As we experience a temporal flux of events our expectations of future events change. Such expectations seem 
to be central to our perception of affect in music, but we have little understanding of how expectations 
change as recent information is integrated. When music establishes a pitch centre (tonality), we rapidly learn 
to anticipate its continuation. What happens when anticipations are challenged by new events? Here we 
show that providing a melodic challenge to an established tonality leads to progressive changes in the impact 
of the features of the stimulus on listeners' expectations. The results demonstrate that retrospective analysis 
of recent events can establish new patterns of expectation that converge towards probabilistic 
interpretations of the temporal stream. These studies point to wider applications of understanding the 
impact of information flow on future prediction and its behavioural utility. 



We choose to study music to understand the real-time modification of temporal prediction because of its 
basis in well-learned stylistic regularities and its dynamic unfolding. Statistical analyses of musical 
repertoires reveal more or less probable transitions from note(s) to note(s) within a given style, and 
these probabilities have been used successfully in computational models to predict the melodic expectations of 
listeners, including the strong influence of pitch proximity. Western music typically comprises both 
melody (the single voice tune) and harmony (the multiple voice progression of chords which accompany it). The 
perceptual processing of probable events in music is facilitated, so the formation of accurate expectations is 
behaviourally beneficial. 

There has been little consideration of how expectations change through time. A given musical event necessarily 
serves a dual function as the fulfilment or violation of previous expectations, and towards establishment of future 
expectations. Huron developed the 'ITPRA' theory of expectation in relation to music. The 'Imagination' and 
'Tension' responses occur prior to a musical event, while the 'Prediction', 'Reaction' and 'Appraisal' responses 
occur thereafter. Correct predictions are rewarded and incorrect ones penalised by the limbic system. Appraisal, 
which involves a cognitive assessment of the whole context, has been little investigated. Musical appraisal is an 
example of processes that are significant in the assessment of any event stream of biological relevance. We refer 
here to 'retrospection' to address a component of appraisal that, in the current experiment, deals with musical 
tonality (pitch-centeredness). 

Models of tonality assume the importance of a recency effect in music listening: the most recent musical events 
in a sequence influence expectations more than prior events, which decay in sensory memory. Estimates of the 
rate of exponential sensory decay range from a one- to a four-second half-life, but this may also involve the 
influence of intervening events. Thus expectations may derive from long-term knowledge acquired through 
prior exposure to the statistical properties of music within a given tradition, and also from short-term memory of 
hearing the recent distribution of notes. In sum, expectations may be sensory or cognitive, stimulus-driven or a 
result of musical acculturation. 
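
To give a sense of scale for these half-life estimates, the following minimal sketch assumes simple exponential decay of a sensory trace (the text gives only the half-life range, not a formula) and evaluates the fraction remaining at delays of 0, 1.8, 6 and 19.2 s, the probe delays used in this study. The function and constant names are illustrative and are not taken from the study.

```python
def decay_weight(elapsed_s: float, half_life_s: float) -> float:
    """Fraction of a trace remaining after elapsed_s seconds,
    ASSUMING pure exponential decay with the given half-life."""
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    return 0.5 ** (elapsed_s / half_life_s)


# Probe delays reported in the Methods (seconds).
PROBE_DELAYS_S = (0.0, 1.8, 6.0, 19.2)
# Lower and upper half-life estimates quoted in the text (seconds).
HALF_LIVES_S = (1.0, 4.0)

for half_life in HALF_LIVES_S:
    remaining = [decay_weight(delay, half_life) for delay in PROBE_DELAYS_S]
    print(f"half-life {half_life} s:", [f"{w:.4f}" for w in remaining])
```

Under this assumption, even the slowest estimate (a 4 s half-life) leaves under 4% of the trace at the 19.2 s delay, which is why effects that persist across that delay are hard to attribute to the sensory trace alone.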

We assessed whether the influence of the passage of time upon expectations can be demonstrated empirically 
by defining conditions in which both expectation and retrospective appraisal contribute to musical processing. 
Listeners rated the goodness of fit of a probe tone (a single pitch) to a preceding melodic context, and by 
manipulating the time delay of the probe presentation we tested the robustness of expectations established by 
the context and the evidence for retrospective appraisal or sensory memory consolidation (such as decay or 
recoding). We interposed silences, as often occur in music, between the melodic context and the probe tone. 
We probed listeners' expectations at delays ranging from 0 to 19.2 seconds, to capture any shifts in expectedness 
resulting from reappraisal or memory consolidation. One previous musical study interposed silences of 7.07 s to 
13.3 s between prime and probe, and found that even across such an extensive span, the proximity in pitch of the 
last note of the prime to the subsequent probe significantly facilitated its perception. Other studies of musical 
expectation more typically separate context from probe tone by only ~1 s. 

Figure 1 | The 'More Unstable' stimulus. Sample stimulus comprising 4-part 
chords in F# major (first dotted box) with an arpeggio continuation 
outlining a new tonal centre in G major (second dotted box). This is more 
unstable as the final arpeggio has a mean information content (based on 
IDyOM) of 5.64. A piano timbre was used for stimulus presentation. Note 
that both stimuli were heard starting in all twelve possible keys. In both 
figures, accidentals apply only to the note that immediately follows; in this 
figure advisory accidentals are shown in the second bar for G and D, 
confirming that they are not sharpened (unlike their corresponding notes 
in the opening key of F# major). 
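
The mean information content values quoted for the two stimuli (5.64 here, 4.29 for the less unstable stimulus) can be read as probabilities. The sketch below assumes the usual definition of information content as the negative base-2 logarithm of the model's probability for an event; the text does not spell out the formula, and the function names are illustrative.

```python
import math


def information_content(probability: float) -> float:
    """Information content in bits, ASSUMING IC = -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def equivalent_probability(ic_bits: float) -> float:
    """Probability corresponding to a given IC. For a MEAN IC over
    several notes this is the geometric mean of their probabilities."""
    return 2.0 ** (-ic_bits)


# Mean IC values reported for the two arpeggios.
for label, mean_ic in (("more unstable", 5.64), ("less unstable", 4.29)):
    print(f"{label}: mean IC {mean_ic} bits, "
          f"p approx. {equivalent_probability(mean_ic):.3f}")
```

On this reading, the notes of the more unstable arpeggio had a typical model probability of about 0.02, against about 0.05 for the less unstable one.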

Results 

We examine harmonic expectations in the context of 4-part chord 
sequences with arpeggiated continuations (3 single notes) that outline 
a new tonal centre (or 'key': see Figures 1 & 2) challenging that of 
the chords. Note that, by design, we do not modulate with transitional 
material (e.g. a cadence) from one key to another; our purpose 
was to introduce levels of instability in the stimulus. Following 
presentation of the stimulus (chord and arpeggio portions), a probe tone 
is presented, and participants rate the goodness of fit of this probe to 
the stimulus. In a previous study, probe tones were presented after 
500 ms, and their ratings suggest that listeners were influenced by the 
temporal course of a change (modulation) from one key to another: 
with no delay to probe presentation, expectations reflected those of 
the final key. Here we determine whether this pattern remains after 
intervening periods of silence, or whether a process of reappraisal 
occurs whereby expectations increasingly fit the stimulus opening or 
the stimulus as a whole as the probe presentation is delayed. 

Figure 2 | The 'Less Unstable' stimulus. Sample stimulus comprising 4-part 
chords in G major with an arpeggio continuation outlining a new 
tonal centre in F# major. This is less unstable as the final arpeggio has a 
mean information content (based on IDyOM) of 4.29. 

Figure 3 | Distribution of probe ratings following the More Unstable 
stimulus at each probe delay. Mean ratings of the goodness of fit of the 
probe tone to the unstable stimulus. 1 - the fit is very bad, 7 - the fit is very 
good. Error bars represent the standard deviation. Labels along the x axis 
describe the tonal relationship of the probe tone to the initial (chordal) 
tonality (or in brackets the number of semitones separating them), as 
follows: tonic - root of the scale (0), m2 - minor second (1), M2 - 
major second (2), m3 - minor third (3), M3 - major third (4), P4 - perfect 
fourth (5), d5 - diminished fifth (6), P5 - perfect fifth (7), m6 - minor sixth 
(8), M6 - major sixth (9), m7 - minor seventh (10), M7 - major 
seventh (11). 

Evidence for the establishment of expectations formed by the most 
recent musical context (the arpeggio) was found with two kinds of 
stimulus. One was 'very unstable' harmonically (stimulus 1, Figure 1: 
all three notes of the concluding arpeggio are foreign to the starting 
key), the other 'less unstable' (stimulus 2, Figure 2: the first arpeggio 
note is intrinsic to the starting key, the second and third foreign; see 
Methods). For the very unstable stimulus, mean ratings of the probe 
tone were correlated with the pitch proximity (in semitones) from 
the last note of the preceding stimulus context (i.e., of the arpeggio). 
This correlation was significant regardless of the intervening time 
between the stimulus context and probe (immediate, r = .67, p = .02; 
1.8 s, r = .80, p = .002; 6 s, r = .68, p = .01; 19.2 s, r = .70, p = .01). 
For the less unstable stimulus, mean probe tone ratings correlated 
with those documented for the tonality of the key of the final 
arpeggio of the stimulus (not the opening tonality). Again, this 
correlation was significant at every time point (immediate, r = .58, p = 
.05; 1.8 s, r = .66, p = .03; 6 s, r = .66, p = .03; 19.2 s, r = .64, p = 
.03), showing that the terminating short-term context induced a 
relatively unambiguous tonal centre in the listeners. 
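
The correlations above relate twelve mean probe ratings to a predictor defined over the same twelve probe pitches. As a minimal sketch of that computation for the pitch-proximity case, the code below assumes that proximity is the negative absolute semitone distance between the probe and the last arpeggio note (the text does not define the measure precisely). The MIDI note numbers and the ratings are hypothetical placeholders, not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def pitch_proximity(probe_midi: int, last_note_midi: int) -> int:
    """ASSUMED measure: closer pitches score higher (0 = same pitch)."""
    return -abs(probe_midi - last_note_midi)


LAST_NOTE_MIDI = 67                   # hypothetical final arpeggio note
PROBE_MIDI = list(range(60, 72))      # twelve chromatic probe pitches
# HYPOTHETICAL mean ratings on the 1-7 scale, for illustration only.
mean_ratings = [3.1, 3.4, 3.8, 4.0, 4.4, 4.9, 5.3, 5.8, 5.2, 4.7, 4.1, 3.9]

proximities = [pitch_proximity(p, LAST_NOTE_MIDI) for p in PROBE_MIDI]
print(f"r = {pearson_r(proximities, mean_ratings):.2f}")
```

The same function could be applied with a tonal profile or a set of model probabilities in place of `proximities`.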

The most interesting results demonstrate that probe tone ratings 
changed with probe delay. A significant interaction between probe 
delay and probe pitch was found in a repeated measures ANOVA of 
mean probe ratings, across both more and less unstable stimuli: 
F(27.2, 408.3) = 1.75, p = .012 (with Huynh-Feldt correction). 
With the more unstable stimulus, this retrospective change led to a 
correlation between the distribution of probe ratings (see Figures 3 
and 4) and the pitch probabilities estimated by the statistical model 
(IDyOM) on the basis not only of the final arpeggio but also of 
the initial tonality of the stimulus context (established by the opening 
chords). This occurred when the probe presentation was delayed by 
6 s (r = .61, p = .04) or 19.2 s (r = .65, p = .03), but not with delays of 
only 0 or 1.8 s. Therefore, after a very unstable stimulus, listeners' 
judgements become increasingly consistent with the overall probabilistic 
structure of the music with increasing time. No correlation 
between probability structures involving the initial tonality and 
probe ratings was found for the less unstable stimuli, though the 
ratings changed with time. 



Figure 3 (0 s) shows that the highest fit (rating 6) in the 
immediate responses was for the major seventh (unlikely were there only 
the initial tonality without the melodic challenge). By 1.8 s the highest 
fit, with a lower rating, is at the perfect 5th; and by the later times 
the ratings distribution is flattened, and no values exceed 5. Further 
analysis was necessary to determine the nature and significance of 
these changes. If changes in probe rating with delay merely represented 
sensory decay, their distribution would collapse toward the 
scale mid-point of '4' (i.e., where the fit of the probe is "neutral"). 
Response variability did decrease as the delay to the probe presentation 
increased, as evidenced by an effect of probe delay in a one-way 
repeated measures ANOVA of the standard deviation of probe tone 
ratings: more unstable stimulus, F(3, 45) = 4.31, p = .009; less 
unstable stimulus, F(3, 45) = 4.70, p = .006. However, Cramer tests 
for the equality of distributions showed that the distribution of probe 
ratings was always significantly different from this scale mid-point 
(immediate, Cramer Statistic = 2.34, p = .005; 1.8 s, Statistic = 1.47, 
p = .003; 6 s, Statistic = 0.97, p = .002; 19.2 s, Statistic = 2.03, p = 
.005). Thus for both stimuli, the process of change is better explained 
as an impact of consolidation than of decay. 

Figure 4 | Comparison of Perceptual Ratings (4a), IDyOM probabilities (4b), and Krumhansl-Kessler Frequencies (4c) for the More Unstable 
Stimulus at 19.2 s delay. All values were z-standardised (that is, expressed in terms of the standard deviation of the value set, becoming both positive and 
negative in relation to the mean) so that they can be viewed on the same scale. The upper panel (a) shows the perceptual ratings; the middle panel (b) the 
IDyOM probabilities; and the bottom panel (c) the Krumhansl-Kessler pitch frequencies for tonal music in a major key (as here). The vertical lines indicate 
the major peaks in the ratings: convergent for the Perceptual and IDyOM values (panels a, b: 2 peak values), and divergent in the third panel 
(c: Krumhansl-Kessler frequencies: 3 peak values). Labels along the x-axis describe the tonal relationship/semitone distance of the pitch to the 
opening tonal centre (as for Fig. 3). 

The further analysis illustrated in Figure 4 uses z-standardised 
values to compare on the same scale the somewhat 
flatter perceptual values with the Krumhansl-Kessler frequencies 
and with the IDyOM probabilities. Figure 4 (top two panels) also 
shows that the two most preferred pitches after 19.2 seconds of 
appraisal of the more unstable stimulus were 5 and 8 semitones from 
the opening tonal centre, in agreement with the probabilistic model. 
In contrast, the Krumhansl-Kessler pitch frequencies shown in the 
third panel are quite different, emphasising intervals of 0, 4 and 7 
semitones. This perceptual resolution and preference is similar to 
that of an optimal Bayesian decision maker performing a task based 
on a fixed amount of perceptual evidence, as developed for example 
in powerful models of continuous speech recognition such as 
Shortlist B and Merge. Discussion of the presently more limited 
study of such Bayesian models in the context of musical pitch 
sequence predictions is given by Temperley. 
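
The z-standardisation and peak comparison described for Figure 4 can be sketched as follows. This is a minimal illustration, assuming the population standard deviation is used (the text does not say whether the population or sample form was applied). The profile values are hypothetical placeholders, not the study's ratings or model outputs.

```python
from statistics import mean, pstdev


def z_standardise(values):
    """Express each value as a distance from the mean in SD units.
    ASSUMES the population standard deviation."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant profile")
    return [(v - m) / sd for v in values]


def top_k_indices(values, k):
    """Indices (semitone distances 0-11 here) of the k largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# HYPOTHETICAL 12-element profile indexed by semitones from the
# opening tonal centre, for illustration only.
example_profile = [3.9, 3.2, 3.6, 3.4, 4.0, 4.8, 3.3, 4.2, 4.6, 3.7, 3.5, 3.8]

z_profile = z_standardise(example_profile)
print([f"{z:+.2f}" for z in z_profile])
print("two highest peaks at semitones:", sorted(top_k_indices(z_profile, 2)))
```

Because standardisation is a linear rescaling, it leaves the location of the peaks unchanged, so peaks can be compared across measures that have very different raw ranges.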

Discussion 

We argue that the observed change in judgement with time reflects 
reappraisal of the prior stimulus taken as a whole, which could also be 
construed as memory consolidation, extracting generalities from the 
information. The process could not be that of 'recoding' enforced by 
further incoming related musical events, because the delays were 
occupied by silence. We note that from a musicological perspective, 
the more unstable stimulus (Figure 1) has a codified form, where the 
melodic triad is a semitone above the opening tonic (home key): this 
is one kind of 'Neapolitan' harmony. This might make one think that 
it is familiar and hence stable. However, codification is not the same 
as familiarity, 'stability' or statistical probability; we 
identify stability with statistical probability and determine it quantitatively. 

We conclude in accord with Huron's ITPRA (and with an optimal 
Bayesian process) that retrospective appraisal influences perceptions 
of musical fit (and this is not solely forgetting). Stimulus properties 
and the stability of their mental representation interact with the 
shifting range of potential cognitive operations. Our results suggest 
that very unstable stimuli are likely to be progressively and then 
integrally reappraised, more stable stimuli less so. This result has 
potentially broad implications for behavioural therapy and learning 
processes in that unfamiliar stimuli (which are hence information 
rich) may be used to influence the statistical interpretation of familiar 
stimuli to which they can be seen to relate, and in turn the unfamiliar 
stimuli can be integrated within the resultant learned statistical 
field. 



Methods 

Four male and 12 female non-musicians studying undergraduate psychology (mean 
age 20.5 years, range 17-37) participated for course credit. Each stimulus consisted of 
a sequence of five chords in one tonality followed by three monophonic notes forming 
an arpeggio in another key a semitone away (inter-onset interval 600 ms throughout). 
The 'less unstable' stimulus commenced in G major and ended with an F# major triad 
(though no modulation was cadentially confirmed), while the corresponding 'more 
unstable' stimulus moved from F# major to G major. Importantly, each stimulus was 
presented in all twelve possible keys. The information content (IC) of the arpeggio of 
the melodic triad in the context of the opening key confirmed the much greater 
unexpectedness of the arpeggio in the more unstable stimulus (more unstable: Mean 
IC = 5.64; less unstable: Mean IC = 4.29). Following the stimulus, a probe tone (taken 
from each of the 12 pitches of the Western chromatic scale) was presented immediately, 
1.8 s, 6 s, or 19.2 s later by Max/MSP using a piano timbre. Participants 
listening over closed headphones rapidly rated the goodness of fit of the probe to the 
preceding context, using a 7-point scale ranging from '1 - the fit is very bad' to '7 - the 
fit is very good'. The pitch relationship of the probe to the starting tonal centre was 
manipulated. 96 trials were presented (2 stimuli × 4 probe delays × 12 probe pitches), 
with a mask of 16 randomly pitched 'Shepard tones' (125 ms tone durations) separating 
each trial. To avoid serial order effects, each trial was presented in a different 
starting key from the preceding and succeeding trials, selected at random. 

The experiments were approved by the human ethics committee of the University 
of Western Sydney, and informed consent was obtained from all subjects. 
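
The trial structure described in the Methods (2 stimuli × 4 probe delays × 12 probe pitches = 96 trials, each starting in a different key from its neighbours) can be sketched as below. This is an illustration under stated assumptions, not the authors' Max/MSP procedure. Shuffling the trial order is an assumption (the text states only that the starting key was selected at random), and all names are hypothetical.

```python
import itertools
import random

STIMULI = ("more unstable", "less unstable")
PROBE_DELAYS_S = (0.0, 1.8, 6.0, 19.2)
PROBE_PITCHES = tuple(range(12))   # semitones above the starting tonal centre
START_KEYS = tuple(range(12))      # twelve possible starting keys


def build_trials(seed=None):
    """Return the full factorial design with a random starting key per
    trial, never repeating the key of the immediately preceding trial."""
    rng = random.Random(seed)
    trials = [
        {"stimulus": stimulus, "delay_s": delay, "probe": probe}
        for stimulus, delay, probe in itertools.product(
            STIMULI, PROBE_DELAYS_S, PROBE_PITCHES
        )
    ]
    rng.shuffle(trials)  # ASSUMPTION: trial order is randomised
    previous_key = None
    for trial in trials:
        # Excluding the previous key also guarantees that each trial
        # differs from the one that follows it.
        candidates = [k for k in START_KEYS if k != previous_key]
        trial["start_key"] = rng.choice(candidates)
        previous_key = trial["start_key"]
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 96
```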